-
Notifications
You must be signed in to change notification settings - Fork 14.8k
[LifetimeSafety] Refactor FactGenerator to use RecursiveASTVisitor #153661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[LifetimeSafety] Refactor FactGenerator to use RecursiveASTVisitor #153661
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
e61817a
to
43d28dc
Compare
43d28dc
to
1877937
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
1877937
to
f3ff755
Compare
f3ff755
to
cdc9849
Compare
f590db6
to
894fa21
Compare
cdc9849
to
3bb7fca
Compare
@llvm/pr-subscribers-clang-analysis Author: Utkarsh Saxena (usx95) ChangesThis change adds a Fixes #153592 Full diff: https://github.com/llvm/llvm-project/pull/153661.diff 2 Files Affected:
diff --git a/clang/lib/Analysis/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety.cpp
index c762f63c45e09..86d9517dde45c 100644
--- a/clang/lib/Analysis/LifetimeSafety.cpp
+++ b/clang/lib/Analysis/LifetimeSafety.cpp
@@ -396,6 +396,7 @@ class FactGenerator : public ConstStmtVisitor<FactGenerator> {
// initializations and destructions are processed in the correct sequence.
for (const CFGBlock *Block : *AC.getAnalysis<PostOrderCFGView>()) {
CurrentBlockFacts.clear();
+ VisitedStmts.clear();
for (unsigned I = 0; I < Block->size(); ++I) {
const CFGElement &Element = Block->Elements[I];
if (std::optional<CFGStmt> CS = Element.getAs<CFGStmt>())
@@ -408,6 +409,12 @@ class FactGenerator : public ConstStmtVisitor<FactGenerator> {
}
}
+ void Visit(const Stmt *S) {
+ if (!S || !VisitedStmts.insert(S).second)
+ return;
+ Base::Visit(S);
+ }
+
void VisitDeclStmt(const DeclStmt *DS) {
for (const Decl *D : DS->decls())
if (const auto *VD = dyn_cast<VarDecl>(D))
@@ -551,6 +558,7 @@ class FactGenerator : public ConstStmtVisitor<FactGenerator> {
FactManager &FactMgr;
AnalysisDeclContext &AC;
llvm::SmallVector<Fact *> CurrentBlockFacts;
+ llvm::DenseSet<const Stmt *> VisitedStmts;
};
// ========================================================================= //
diff --git a/clang/unittests/Analysis/LifetimeSafetyTest.cpp b/clang/unittests/Analysis/LifetimeSafetyTest.cpp
index c8d88b4ea2277..13e5832d70050 100644
--- a/clang/unittests/Analysis/LifetimeSafetyTest.cpp
+++ b/clang/unittests/Analysis/LifetimeSafetyTest.cpp
@@ -11,6 +11,7 @@
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/Testing/TestAST.h"
#include "llvm/ADT/StringMap.h"
+#include "llvm/Testing/Support/Error.h"
#include "gmock/gmock.h"
#include "gtest/gtest.h"
#include <optional>
@@ -20,6 +21,7 @@ namespace clang::lifetimes::internal {
namespace {
using namespace ast_matchers;
+using ::testing::SizeIs;
using ::testing::UnorderedElementsAreArray;
// A helper class to run the full lifetime analysis on a piece of code
@@ -96,21 +98,18 @@ class LifetimeTestHelper {
return OID;
}
- std::optional<LoanID> getLoanForVar(llvm::StringRef VarName) {
+ std::vector<LoanID> getLoansForVar(llvm::StringRef VarName) {
auto *VD = findDecl<VarDecl>(VarName);
- if (!VD)
- return std::nullopt;
+ if (!VD) {
+ ADD_FAILURE() << "No VarDecl found for '" << VarName << "'";
+ return {};
+ }
std::vector<LoanID> LID = Analysis.getLoanIDForVar(VD);
if (LID.empty()) {
ADD_FAILURE() << "Loan for '" << VarName << "' not found.";
- return std::nullopt;
- }
- // TODO: Support retrieving more than one loans to a var.
- if (LID.size() > 1) {
- ADD_FAILURE() << "More than 1 loans found for '" << VarName;
- return std::nullopt;
+ return {};
}
- return LID[0];
+ return LID;
}
std::optional<LoanSet> getLoansAtPoint(OriginID OID,
@@ -121,13 +120,12 @@ class LifetimeTestHelper {
return Analysis.getLoansAtPoint(OID, PP);
}
- std::optional<llvm::DenseSet<LoanID>>
+ std::optional<std::vector<LoanID>>
getExpiredLoansAtPoint(llvm::StringRef Annotation) {
ProgramPoint PP = Runner.getProgramPoint(Annotation);
if (!PP)
return std::nullopt;
- auto Expired = Analysis.getExpiredLoansAtPoint(PP);
- return llvm::DenseSet<LoanID>{Expired.begin(), Expired.end()};
+ return Analysis.getExpiredLoansAtPoint(PP);
}
private:
@@ -197,12 +195,13 @@ MATCHER_P2(HasLoansToImpl, LoanVars, Annotation, "") {
std::vector<LoanID> ExpectedLoans;
for (const auto &LoanVar : LoanVars) {
- std::optional<LoanID> ExpectedLIDOpt = Info.Helper.getLoanForVar(LoanVar);
- if (!ExpectedLIDOpt) {
+ std::vector<LoanID> ExpectedLIDs = Info.Helper.getLoansForVar(LoanVar);
+ if (ExpectedLIDs.empty()) {
*result_listener << "could not find loan for var '" << LoanVar << "'";
return false;
}
- ExpectedLoans.push_back(*ExpectedLIDOpt);
+ ExpectedLoans.insert(ExpectedLoans.end(), ExpectedLIDs.begin(),
+ ExpectedLIDs.end());
}
return ExplainMatchResult(UnorderedElementsAreArray(ExpectedLoans),
@@ -221,17 +220,17 @@ MATCHER_P(AreExpiredAt, Annotation, "") {
<< Annotation << "'";
return false;
}
- std::vector<LoanID> ActualExpiredLoans(ActualExpiredSetOpt->begin(),
- ActualExpiredSetOpt->end());
+ std::vector<LoanID> ActualExpiredLoans = *ActualExpiredSetOpt;
std::vector<LoanID> ExpectedExpiredLoans;
for (const auto &VarName : Info.LoanVars) {
- auto LoanIDOpt = Helper.getLoanForVar(VarName);
- if (!LoanIDOpt) {
+ auto LoanIDs = Helper.getLoansForVar(VarName);
+ if (LoanIDs.empty()) {
*result_listener << "could not find a loan for variable '" << VarName
<< "'";
return false;
}
- ExpectedExpiredLoans.push_back(*LoanIDOpt);
+ ExpectedExpiredLoans.insert(ExpectedExpiredLoans.end(), LoanIDs.begin(),
+ LoanIDs.end());
}
return ExplainMatchResult(UnorderedElementsAreArray(ExpectedExpiredLoans),
ActualExpiredLoans, result_listener);
@@ -730,5 +729,17 @@ TEST_F(LifetimeAnalysisTest, ReassignedPointerThenOriginalExpires) {
EXPECT_THAT(LoansTo({"s1", "s2"}), AreExpiredAt("p_after_s1_expires"));
}
+TEST_F(LifetimeAnalysisTest, NoDuplicateLoansForImplicitCastToConst) {
+ SetupTest(R"(
+ void target() {
+ MyObj a;
+ const MyObj* p = &a;
+ const MyObj* q = &a;
+ POINT(at_end);
+ }
+ )");
+ EXPECT_THAT(Helper->getLoansForVar("a"), SizeIs(2));
+}
+
} // anonymous namespace
} // namespace clang::lifetimes::internal
|
@llvm/pr-subscribers-clang Author: Utkarsh Saxena (usx95) ChangesThis change adds a Fixes #153592 Full diff: https://github.com/llvm/llvm-project/pull/153661.diff 2 Files Affected:
diff --git a/clang/lib/Analysis/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety.cpp
index c762f63c45e09..86d9517dde45c 100644
--- a/clang/lib/Analysis/LifetimeSafety.cpp
+++ b/clang/lib/Analysis/LifetimeSafety.cpp
@@ -396,6 +396,7 @@ class FactGenerator : public ConstStmtVisitor<FactGenerator> {
// initializations and destructions are processed in the correct sequence.
for (const CFGBlock *Block : *AC.getAnalysis<PostOrderCFGView>()) {
CurrentBlockFacts.clear();
+ VisitedStmts.clear();
for (unsigned I = 0; I < Block->size(); ++I) {
const CFGElement &Element = Block->Elements[I];
if (std::optional<CFGStmt> CS = Element.getAs<CFGStmt>())
@@ -408,6 +409,12 @@ class FactGenerator : public ConstStmtVisitor<FactGenerator> {
}
}
+ void Visit(const Stmt *S) {
+ if (!S || !VisitedStmts.insert(S).second)
+ return;
+ Base::Visit(S);
+ }
+
void VisitDeclStmt(const DeclStmt *DS) {
for (const Decl *D : DS->decls())
if (const auto *VD = dyn_cast<VarDecl>(D))
@@ -551,6 +558,7 @@ class FactGenerator : public ConstStmtVisitor<FactGenerator> {
FactManager &FactMgr;
AnalysisDeclContext &AC;
llvm::SmallVector<Fact *> CurrentBlockFacts;
+ llvm::DenseSet<const Stmt *> VisitedStmts;
};
// ========================================================================= //
diff --git a/clang/unittests/Analysis/LifetimeSafetyTest.cpp b/clang/unittests/Analysis/LifetimeSafetyTest.cpp
index c8d88b4ea2277..13e5832d70050 100644
--- a/clang/unittests/Analysis/LifetimeSafetyTest.cpp
+++ b/clang/unittests/Analysis/LifetimeSafetyTest.cpp
@@ -11,6 +11,7 @@
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/Testing/TestAST.h"
#include "llvm/ADT/StringMap.h"
+#include "llvm/Testing/Support/Error.h"
#include "gmock/gmock.h"
#include "gtest/gtest.h"
#include <optional>
@@ -20,6 +21,7 @@ namespace clang::lifetimes::internal {
namespace {
using namespace ast_matchers;
+using ::testing::SizeIs;
using ::testing::UnorderedElementsAreArray;
// A helper class to run the full lifetime analysis on a piece of code
@@ -96,21 +98,18 @@ class LifetimeTestHelper {
return OID;
}
- std::optional<LoanID> getLoanForVar(llvm::StringRef VarName) {
+ std::vector<LoanID> getLoansForVar(llvm::StringRef VarName) {
auto *VD = findDecl<VarDecl>(VarName);
- if (!VD)
- return std::nullopt;
+ if (!VD) {
+ ADD_FAILURE() << "No VarDecl found for '" << VarName << "'";
+ return {};
+ }
std::vector<LoanID> LID = Analysis.getLoanIDForVar(VD);
if (LID.empty()) {
ADD_FAILURE() << "Loan for '" << VarName << "' not found.";
- return std::nullopt;
- }
- // TODO: Support retrieving more than one loans to a var.
- if (LID.size() > 1) {
- ADD_FAILURE() << "More than 1 loans found for '" << VarName;
- return std::nullopt;
+ return {};
}
- return LID[0];
+ return LID;
}
std::optional<LoanSet> getLoansAtPoint(OriginID OID,
@@ -121,13 +120,12 @@ class LifetimeTestHelper {
return Analysis.getLoansAtPoint(OID, PP);
}
- std::optional<llvm::DenseSet<LoanID>>
+ std::optional<std::vector<LoanID>>
getExpiredLoansAtPoint(llvm::StringRef Annotation) {
ProgramPoint PP = Runner.getProgramPoint(Annotation);
if (!PP)
return std::nullopt;
- auto Expired = Analysis.getExpiredLoansAtPoint(PP);
- return llvm::DenseSet<LoanID>{Expired.begin(), Expired.end()};
+ return Analysis.getExpiredLoansAtPoint(PP);
}
private:
@@ -197,12 +195,13 @@ MATCHER_P2(HasLoansToImpl, LoanVars, Annotation, "") {
std::vector<LoanID> ExpectedLoans;
for (const auto &LoanVar : LoanVars) {
- std::optional<LoanID> ExpectedLIDOpt = Info.Helper.getLoanForVar(LoanVar);
- if (!ExpectedLIDOpt) {
+ std::vector<LoanID> ExpectedLIDs = Info.Helper.getLoansForVar(LoanVar);
+ if (ExpectedLIDs.empty()) {
*result_listener << "could not find loan for var '" << LoanVar << "'";
return false;
}
- ExpectedLoans.push_back(*ExpectedLIDOpt);
+ ExpectedLoans.insert(ExpectedLoans.end(), ExpectedLIDs.begin(),
+ ExpectedLIDs.end());
}
return ExplainMatchResult(UnorderedElementsAreArray(ExpectedLoans),
@@ -221,17 +220,17 @@ MATCHER_P(AreExpiredAt, Annotation, "") {
<< Annotation << "'";
return false;
}
- std::vector<LoanID> ActualExpiredLoans(ActualExpiredSetOpt->begin(),
- ActualExpiredSetOpt->end());
+ std::vector<LoanID> ActualExpiredLoans = *ActualExpiredSetOpt;
std::vector<LoanID> ExpectedExpiredLoans;
for (const auto &VarName : Info.LoanVars) {
- auto LoanIDOpt = Helper.getLoanForVar(VarName);
- if (!LoanIDOpt) {
+ auto LoanIDs = Helper.getLoansForVar(VarName);
+ if (LoanIDs.empty()) {
*result_listener << "could not find a loan for variable '" << VarName
<< "'";
return false;
}
- ExpectedExpiredLoans.push_back(*LoanIDOpt);
+ ExpectedExpiredLoans.insert(ExpectedExpiredLoans.end(), LoanIDs.begin(),
+ LoanIDs.end());
}
return ExplainMatchResult(UnorderedElementsAreArray(ExpectedExpiredLoans),
ActualExpiredLoans, result_listener);
@@ -730,5 +729,17 @@ TEST_F(LifetimeAnalysisTest, ReassignedPointerThenOriginalExpires) {
EXPECT_THAT(LoansTo({"s1", "s2"}), AreExpiredAt("p_after_s1_expires"));
}
+TEST_F(LifetimeAnalysisTest, NoDuplicateLoansForImplicitCastToConst) {
+ SetupTest(R"(
+ void target() {
+ MyObj a;
+ const MyObj* p = &a;
+ const MyObj* q = &a;
+ POINT(at_end);
+ }
+ )");
+ EXPECT_THAT(Helper->getLoansForVar("a"), SizeIs(2));
+}
+
} // anonymous namespace
} // namespace clang::lifetimes::internal
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if this is the right approach. If everything works out well, every time we call Visit
on an expression, there should be a guarantee we already visited all the subexpressions of it (modulo some corner cases with short circuiting operators, ternaries and trivially false branches). So, we might be able to structure the code in a way that we never need to call Visit
recursively for a subexpression, and we do not need to keep a VisitedStmts
set.
But in case that does not work out for some reason I am also fine with this approach.
3bb7fca
to
546483e
Compare
546483e
to
f11f942
Compare
Yes this is the ideal scenario. But we visit the statements given by the CFG and this can, more often than not, skip many sub expressions. I don't have a great solution. An alternate could be to use a Recursive AST visitor (RAV) along that drives our FactGenerator stmt visitor. The RAV would still need to maintain a |
Could you elaborate on why?
Hmm, there is a change that this could be a bit simpler than the current solution. If you can give this a try to see where does that lead it would be awesome. |
5ccc210
to
dbb99ed
Compare
The CFG can contain sub expressions as individual CFG statements. Later parent expressions can again appear as independent CFGStmt. It is not guaranteed that all subexpressions of all expr types appear as CFGStmt. So we recursively visit each CFGStmt and skip already visited Stmt which might have appeared before in CFG. |
dbb99ed
to
2f329e7
Compare
Switched to using RAV for driving the FactGenerator @Xazax-hun |
2eac6ae
to
3b3b93b
Compare
3b3b93b
to
440c3fe
Compare
Thanks! I am not actually sure if this one is simpler than the original version. Sorry about that. I don't have a strong feeling which one to pick. I guess the complexity here is an inevitable consequence of our CFGs being just an overlay on top of the AST and it is not entirely clear when should we use one or the other for traversals because it depends on what we are processing. Let me know which one do you prefer (this or the previous), and I am happy to approve either of them. |
This does introduce an extra layer of orchestration but I like the current approach. The
+1 |
Thanks! I have one more question. So we need the reverse post order property of the CFG to ensure that we visit everything in the right order. The only reason why I am a bit unsure here, is because I wonder if it can ever cause some trouble that the RAV visiting the subexpressions is crossing a basic block boundary and venturing into that block too early, before all of its predecessors have been visited. If this cannot happen, I think the current approach should be fine. |
This PR refactors the FactGenerator class to use RecursiveASTVisitor for traversing statements, which helps avoid duplicate fact generation for nodes that appear multiple times in the AST. The implementation:
Fixes #153592